Entity Synonym Discovery via Multipiece Bilateral Context Matching
Being able to automatically discover synonymous entities in an open-world
setting benefits various tasks such as entity disambiguation or knowledge graph
canonicalization. Existing works either utilize only entity features or rely
on structured annotations from a single piece of context in which the entity is
mentioned. To leverage the diverse contexts in which entities are mentioned, in
this paper, we generalize the distributional hypothesis to a multi-context setting
and propose a synonym discovery framework that detects entity synonyms from
free-text corpora with attention to both effectiveness and robustness. As one
of the key components in synonym discovery, we introduce a neural network model,
SYNONYMNET, to determine whether or not two given entities are synonyms of each
other. Instead of using entity features, SYNONYMNET makes use of multiple
pieces of context in which each entity is mentioned and compares
context-level similarity via a bilateral matching scheme. Experimental results
demonstrate that the proposed model is able to detect synonym sets that are not
observed during training on both generic and domain-specific datasets:
Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in
terms of Area Under the Curve and 3.19% in terms of Mean Average Precision
compared to the best baseline method.
Comment: In IJCAI 2020 as a long paper. Code and data are available at
https://github.com/czhang99/SynonymNe
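The idea of comparing two entities through their multiple contexts in both directions can be illustrated with a toy sketch. This is not SynonymNet itself: it assumes each entity's contexts have already been encoded as fixed-length vectors (the encoder is out of scope) and replaces the learned matching network with plain cosine similarity, max-pooled over contexts in each direction. The names `cosine`, `one_sided_match`, and `bilateral_match_score` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def one_sided_match(src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
    """For each context in src, keep its best-matching context in tgt, then average."""
    return sum(max(cosine(s, t) for t in tgt) for s in src) / len(src)


def bilateral_match_score(contexts_a: Sequence[Vector],
                          contexts_b: Sequence[Vector]) -> float:
    """Symmetric score: average of matching A against B and B against A."""
    if not contexts_a or not contexts_b:
        raise ValueError("each entity needs at least one context embedding")
    return 0.5 * (one_sided_match(contexts_a, contexts_b)
                  + one_sided_match(contexts_b, contexts_a))


# Hypothetical toy context embeddings for two entities.
entity_a = [[1.0, 0.0], [0.0, 1.0]]
entity_b = [[1.0, 0.0]]
print(bilateral_match_score(entity_a, entity_b))  # 0.75
```

Matching in both directions keeps the score symmetric and stops one entity with many loosely related contexts from dominating; a learned model would replace the fixed cosine and mean with trainable components.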
TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series
This work summarizes two strategies for completing time-series (TS) tasks
using today's large language models (LLMs): LLM-for-TS, which designs and trains
a fundamental large model for TS data, and TS-for-LLM, which enables a
pre-trained LLM to handle TS data. Considering insufficient data accumulation,
limited resources, and semantic context requirements, this work focuses on
TS-for-LLM methods, aiming to activate LLMs' ability to handle TS data by
designing a TS embedding method suitable for LLMs. The proposed method, named
TEST, first tokenizes the TS, then builds an encoder that embeds the tokens via
instance-wise, feature-wise, and text-prototype-aligned contrast, next creates
prompts that make the LLM more receptive to the embeddings, and finally
performs the TS tasks. Experiments are carried out on TS classification and
forecasting tasks using 8 LLMs with different structures and sizes. Although
its results do not significantly outperform current SOTA models customized for
TS tasks, by treating the LLM as a pattern machine, TEST endows the LLM with the
ability to process TS data without compromising its language ability. This
paper is intended to serve as a foundational work that will inspire further
research.
Comment: 10 pages, 6 figures
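The tokenize-then-align pipeline described above can be sketched under stated assumptions; the abstract does not specify TEST's encoder or exact losses. Here TS "tokens" are assumed to be fixed-length sliding windows, TS and text-prototype embeddings are assumed to be precomputed vectors, and text-prototype-aligned contrast is approximated by an InfoNCE loss that treats the nearest prototype as the positive and the remaining prototypes as negatives. All function names and the `window`, `stride`, and `temperature` parameters are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def tokenize_series(series: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Split a univariate series into fixed-length sliding-window segments (assumed TS tokens)."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [list(series[i:i + window]) for i in range(0, len(series) - window + 1, stride)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce(anchor: Vector, positive: Vector, negatives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss: low when the anchor is closer to the positive than to every negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def nearest_prototype(embedding: Vector, prototypes: Sequence[Vector]) -> int:
    """Index of the text prototype most similar (by cosine) to the TS embedding."""
    return max(range(len(prototypes)), key=lambda i: cosine(embedding, prototypes[i]))


def prototype_alignment_loss(embedding: Vector, prototypes: Sequence[Vector],
                             temperature: float = 0.1) -> float:
    """Assumed alignment: nearest prototype is the positive, the others are negatives."""
    idx = nearest_prototype(embedding, prototypes)
    negatives = [p for i, p in enumerate(prototypes) if i != idx]
    return info_nce(embedding, prototypes[idx], negatives, temperature)


tokens = tokenize_series([1.0, 2.0, 3.0, 4.0, 5.0], window=2, stride=1)
print(tokens)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]
```

In a trained system the embeddings would come from an encoder and the loss would be minimized by gradient descent; the sketch only shows what the objective rewards.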
MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data
A knowledge graph (KG) contains entities and the relations between them.
Owing to their representational power, KGs have been successfully applied to
support many medical/healthcare tasks. However, in the medical domain, knowledge
holds only under certain conditions. For example, the symptom "runny nose"
strongly indicates the disease "whooping cough" when the patient is a baby
rather than a person of another age. Such conditions on medical knowledge are
crucial for decision-making in various medical applications, yet they are
missing from existing medical KGs. In this paper, we aim to discover medical
knowledge conditions from texts to enrich KGs.
Electronic Medical Records (EMRs) are systematized collections of clinical
data that contain detailed information about patients, so EMRs can be a good
resource for discovering medical knowledge conditions. Unfortunately, the amount
of available EMRs is limited for reasons such as regulation. Meanwhile, a
large amount of medical question answering (QA) data is available, which can
greatly help with this task. However, the quality of medical QA data varies
widely, which may degrade the quality of the discovered medical knowledge
conditions. In light of these challenges, we propose a new truth discovery
method, MedTruth, for medical knowledge condition discovery, which incorporates
prior source quality information into the source reliability estimation
procedure and also utilizes the knowledge triple information for trustworthy
information computation. We conduct a series of experiments on real-world
medical datasets to demonstrate that the proposed method can discover meaningful
and accurate conditions for medical knowledge by leveraging both EMR and QA data.
Further, the proposed method is tested on synthetic datasets to validate its
effectiveness under various scenarios.
Comment: Accepted as a CIKM 2019 long paper
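MedTruth's exact update rules are not given in this abstract. As a rough, hypothetical illustration of the general truth discovery recipe (alternate between reliability-weighted voting and source-reliability estimation, with a prior on source quality), the sketch below uses a log-ratio reliability weight of the kind used in classic truth discovery methods such as CRH and blends it linearly with a user-supplied prior. The blending parameter `alpha`, the smoothing constants, the toy sources and claims, and all function names are assumptions; the knowledge-triple information MedTruth also exploits is not modeled beyond keying claims by triple.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple

# Hypothetical data layout: source -> {knowledge triple: claimed condition}.
Claims = Dict[str, Dict[Hashable, str]]


def discover_truths(claims: Claims, prior: Dict[str, float],
                    alpha: float = 0.5, iterations: int = 10
                    ) -> Tuple[Dict[Hashable, str], Dict[str, float]]:
    """Iterative weighted-voting truth discovery with an assumed prior on source quality.

    alpha blends the prior weight with the weight estimated from the data;
    sources missing from `prior` default to a prior weight of 1.0.
    """
    weights = {s: prior.get(s, 1.0) for s in claims}
    truths: Dict[Hashable, str] = {}
    for _ in range(iterations):
        # Step 1: estimate each triple's condition by reliability-weighted voting.
        votes: Dict[Hashable, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        for source, observations in claims.items():
            for obj, value in observations.items():
                votes[obj][value] += weights[source]
        truths = {obj: max(tally, key=tally.get) for obj, tally in votes.items()}

        # Step 2: re-estimate reliability from each source's smoothed error rate,
        # using -log(source error / total error) as the data-driven weight.
        errors = {}
        for source, observations in claims.items():
            wrong = sum(1 for obj, value in observations.items() if truths[obj] != value)
            errors[source] = (wrong + 0.5) / (len(observations) + 1)
        total = sum(errors.values())
        for source in claims:
            estimated = -math.log(errors[source] / total)
            weights[source] = alpha * prior.get(source, 1.0) + (1 - alpha) * estimated
    return truths, weights


# Toy example: one EMR source disagrees with two QA sources about a condition.
triple = ("runny nose", "indicates", "whooping cough")
claims = {"emr": {triple: "baby"}, "qa_1": {triple: "adult"}, "qa_2": {triple: "adult"}}
with_prior, _ = discover_truths(claims, prior={"emr": 5.0, "qa_1": 0.5, "qa_2": 0.5})
uniform, _ = discover_truths(claims, prior={})
print(with_prior[triple], uniform[triple])  # baby adult
```

With a strong prior on the EMR source, its single claim outweighs the two QA claims; with a uniform prior, the majority QA answer wins. This is why a prior on source quality matters when the quality of QA data varies.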
- …